
ENCOPLOT: Experiments and Results in Automatic Plagiarism Detection

Author: Grozea Cristian

Optimising Rolling Stock Planning including Maintenance with Constraint Programming and Quantum Annealing

Author: Bickert Patricia
Grozea Cristian
Hans Ronny
Koch Matthias
Riehn Christina
Wolf Armin
Publication date: 25/09/2023

We propose and compare Constraint Programming (CP) and Quantum Annealing (QA) approaches for rolling stock assignment optimisation considering necessary maintenance tasks. In the CP approach, we model the problem with an Alldifferent constraint, extensions of the Element constraint, and logical implications, among others. For the QA approach, we develop a quadratic unconstrained binary optimisation (QUBO) model. For evaluation, we use data sets based on real data from Deutsche Bahn and run the QA approach on real quantum computers from D-Wave. Classical computers are used to evaluate the CP approach as well as tabu search for the QUBO model. At the current development stage of physical quantum annealers, we find that both approaches tend to produce comparable results.

arXiv.org e-Print Archive

Findings of the WMT 2017 Biomedical Translation Shared Task

Author: Bojar Ondrej
Boyer Arthur
Grozea Cristian
Haddow Barry
Jimeno Yepes Antonio
Kittner Madeleine
Lichtblau Yvonne
Névéol Aurélie
Neves Mariana
Pecina Pavel
Roller Roland
Rosa Rudolf
Siu Amy
Thomas Philippe
Trescher Saskia
Verspoor Karin
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

Automatic translation of documents is an important task in many domains, including the biological and clinical domains. The second edition of the Biomedical Translation task in the Conference on Machine Translation focused on the automatic translation of biomedical documents between English and various European languages. This year, we addressed ten languages: Czech, German, English, French, Hungarian, Polish, Portuguese, Spanish, Romanian and Swedish. Test sets included both scientific publications (from the Scielo and EDP Sciences databases) and health-related news (from the Cochrane and UK National Health Service web sites). Seven teams participated in the task, submitting a total of 82 runs. Herein we describe the test sets, the participating systems, and the results of both the automatic and manual evaluation of the translations.

Crossref

Fraunhofer-ePrints

Edinburgh Research Explorer

Biblio at Institute of Formal and Applied Linguistics

Challenges in Representation Learning: A report on three machine learning contests

Author: Athanasakis Dimitris
Bengio Yoshua
Bergstra James
Carrier Pierre Luc
Chuang Zhang
Courville Aaron
Cukierski Will
Erhan Dumitru
Feng Fangxiang
Goodfellow Ian J.
Grozea Cristian
Hamner Ben
Ionescu Radu
Lee Dong-Hyun
Li Ruifan
Milakov Maxim
Mirza Mehdi
Park John
Popescu Marius
Ramaiah Chetan
Romaszko Lukasz
Shawe-Taylor John
Tang Yichuan
Thaler David
Wang Xiaojie
Xie Jingjing
Xu Bing
Zhou Yingbo
Publication date: 01/01/2013

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions. Comment: 8 pages, 2 figures.

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

Correlation of velocity and susceptibility in patients with aneurysmal subarachnoid hemorrhage

Author: Dahlem Markus
Dreier Jens P.
Friedman Alon
Grozea Cristian
Hartings Jed A.
Kola Vasilis
Lublinsky Svetlana
Lückl Janos
Major Sebastian
Martus Peter
Milakara Denny
Scheel Michael
Schoknecht Karl
Winkler Maren K. L.
Woitzik Johannes
Publication date: 01/01/2017

In many cerebral grey matter structures including the neocortex, spreading depolarization (SD) is the principal mechanism of the near-complete breakdown of the transcellular ion gradients with abrupt water influx into neurons. Accordingly, SDs are abundantly recorded with subdural electrode strips in patients with traumatic brain injury, spontaneous intracerebral hemorrhage, aneurysmal subarachnoid hemorrhage (aSAH) and malignant hemispheric stroke. SD is observed as a large slow potential change, spreading in the cortex at velocities between 2 and 9 mm/min. Velocity and SD susceptibility typically correlate positively in various animal models. For patients monitored in neurocritical care, the Co-Operative Studies on Brain Injury Depolarizations (COSBID) recommends several variables to quantify SD occurrence and susceptibility, although accurate measurement of SD velocity has not been possible. Therefore, we developed an algorithm to estimate SD velocities by reconstructing SD trajectories of the wave-front's curvature center from magnetic resonance imaging scans and time-of-SD-arrival differences between subdural electrode pairs. We then correlated variables indicating SD susceptibility with algorithm-estimated SD velocities in twelve aSAH patients. Highly significant correlations supported the algorithm's validity. The trajectory search failed significantly more often for SDs recorded directly over emerging focal brain lesions, suggesting that in humans, as in animals, the complexity of SD propagation paths increases in tissue undergoing injury.

Institutional Repository of the Freie Universität Berlin

Findings of the WMT 2019 Biomedical Translation Shared Task: Evaluation for MEDLINE Abstracts and Biomedical Terminologies

Author: Bawden Rachel
Bretonnel Cohen Kevin
Grozea Cristian
Jimeno Yepes Antonio
Kittner Madeleine
Krallinger Martin
Mah Nancy
Neves Mariana
Névéol Aurélie
Siu Amy
Soares Felipe
Verspoor Karin
Vicente Navarro Maika
Publication date: 01/01/2019

Crossref

Edinburgh Research Explorer

Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports

Author: Bawden Rachel
Di Nunzio Giorgio Maria
Farré-Maduell Eulàlia
Grozea Cristian
Gérardin Christel
Jimeno Yepes Antonio
Johan Estrada Darryl
Krallinger Martin
Lima-López Salvador
Neves Mariana
Névéol Aurélie
Roller Roland
Siu Amy
Thomas Philippe
Vezzani Federica
Vicente Navarro Maika
Wiemann Dina
Yeganova Lana
Publication venue: HAL CCSD
Publication date: 07/12/2022

In the seventh edition of the WMT Biomedical Task, we addressed a total of seven language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian and English/Italian. This year’s test sets covered three biomedical text genres. In addition to the scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical case translations was given special attention by involving clinicians in the preparation of reference translations and in the manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. Five teams participated in the ClinSpEn sub-task.

INRIA a CCSD electronic archive server

Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection

Author: Barrón-Cedeño Alberto
Chomsky Noam
Clough Paul
Comas Rubén
Dolan William B.
Dorr Bonnie J.
Dutrey Camille
España-Bonet Cristina
Grozea Cristian
Levin Beth
Martí M.
MacQueen J. B.
Martin Brian
Maurer Hermann
Max Aurélien
Mel'čuk Igor A.
Milićević Jasmina
Rosso Paolo
Potthast Martin
Stamatatos Efstathios
Stein Benno
Talmy Leonard
Vila Marta
Publication venue: MIT Press - Journals
Publication date: 01/01/2013

Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.

We would like to thank the people who participated in the annotation of the P4P corpus, Horacio Rodriguez for his helpful advice as an experienced researcher, and the reviewers of this contribution for their valuable comments, which helped improve this article. This research work was partially carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship Programme. The research leading to these results received funding from the EU FP7 Programme 2007-2013 (grant no. 246016), the MICINN projects TEXT-ENTERPRISE 2.0 and TEXT-KNOWLEDGE 2.0 (TIN2009-13391), the EC WIQ-EI IRSES project (grant no. 269180), and the FP7 Marie Curie People Programme. The research work of A. Barrón-Cedeño and M. Vila was financed by the CONACyT-Mexico 192021 grant and the MECD-Spain FPU AP2008-02185 grant, respectively. The research work of A. Barrón-Cedeño was partially done in the framework of his Ph.D. at the Universitat Politecnica de Valencia.

Barrón Cedeño, LA.; Vila, M.; Martí, MA.; Rosso, P. (2013). Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection. Computational Linguistics. 39(4):917-947. https://doi.org/10.1162/COLI_a_00153

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

RiuNet

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Plagiarism Detection with State of the Art Compression Programs

Author: Grozea Cristian

This note documents a new approach to plagiarism detection based on Algorithmic Information Theory. It uses results from the author’s Ph.D. thesis.

CiteSeerX

Free-Extendible Prefix-Free Sets and an Extension of the Kraft-Chaitin Theorem

Author: Grozea Cristian
Publication venue: Journal of Universal Computer Science
Publication date: 01/01/2000

First, the dual set of a finite prefix-free set is defined. Using this notion, we describe equivalent conditions for a finite prefix-free set to be indefinitely extendible. This leads to a simple proof of the Kraft-Chaitin Theorem. Finally, we discuss the influence of the alphabet size on the indefinite extensibility property.

Note: C. S. Calude and G. Stefanescu (eds.), Automata, Logic, and Computability. Special issue: Festschrift dedicated to Professor Sergiu Rudeanu.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA OAI-PMH Endpoint

ARPHA Preprints